A data-driven method of pile-up correction for the substructure of massive jets 
We describe a method to measure and subtract the incoherent component of energy flow arising
from multiple interactions from jet shape/substructure observables of ultra-massive jets. The
amount subtracted is a function of the jet shape variable of interest and not a universal property. 
Such a correction is expected to significantly reduce any bias in the corresponding distributions 
generated by the presence of multiple interactions, and to improve measurement resolution. Since 
in our method the correction is obtained from the data, it is not subject to uncertainties coming 
from the use of theoretical calculations and/or Monte Carlo event generators. We derive our correc- 
tion method for the jet mass, angularity and planar flow. We find these corrections to be in good 
agreement with data on massive jets observed by the CDF collaboration. Finally, we comment on 
the linkage with the concept of jet area and jet mass area. 



I. INTRODUCTION 

Incoherent processes in high-energy hadron-hadron 
collisions like multiple interactions, the underlying event
in a high transverse momentum ($p_T$) scatter or instrumental
effects may blur the picture when various hard
processes are under study. This is especially important
for studies of high $p_T$ ultra-massive jets: though the
jet substructure can be computed perturbatively with 
reasonable accuracy, these incoherent processes lead to 
reductions in the resolution and sensitivity of various 
searches for new physics T^^. The existing correction 
methods rely mostly on the Monte Carlo simulation of 
the underlying event and additional (pile-up) interactions 
at high instantaneous luminosities. 

We propose a data-driven method that enables one 
to measure the effect of incoherent contributions to jet 
substructure variables and to get an analytical expres- 
sion for the functional form of the correction. Using this 
method one has available jet-variable-dependent corrections
rather than global ones. Thus, the measured substructure
distribution should correspond to that arising
from the hard part of the event. The correction tech- 
nique can be applied simultaneously to several jet-shape 
variables (of a fixed large mass) leading to improved 
resolution of the relevant jet variables and to an in- 
crease in the sensitivity to new physics signals. The pro- 
posed method has been successfully demonstrated with 
CDF data collected in proton-antiproton collisions at
$\sqrt{s} = 1.96$ TeV [7].

The susceptibility of modern jet algorithms that are
infrared and collinear (IRC) safe to soft and weakly correlated
contributions can be elegantly described by the
concept of a jet area. Such contributions may shift
the value of any given jet variable. When studying the
substructure of highly boosted ultra-massive jets, of particular
interest are the corrections to a jet variable as a
function of its value on a jet-by-jet basis, which are independent
of the average global shift to the jet momentum. The
proposed method is based on a data-driven measurement
of the size and effect of the incoherent component of energy
flow for a given jet.

The actual measurement uses the method [5] of employing
the dominant dijet topology of high $p_T$ jets produced
via QCD interactions and measuring the energy
deposition in a fixed size cone rotated by 90° relative to 
the dijet axis (as is used in a recent CDF study [3 E])- 
Our technique can be applied to both high and low in- 
stantaneous luminosity (L) regimes such as those experi- 
enced or expected at the Tevatron and the Large Hadron 
Collider. In the examples we use to illustrate this pro- 
cedure, the average dependence of the corrections on the 
relevant variables has been determined by the CDF collaboration
using a large sample of high $p_T$ jets. This
technique gives the actual correction to be applied to the 
relevant jet variables. However, in practice, the correc- 
tions are parametrized based on theoretical expectations, 
as we discuss below. 

The size of the incoherent effects can be also extracted 
using sophisticated methods that have been studied in [4]-
[6] and that are incorporated within the FastJet frame-
work [TUJITT]. In the latter case, assuming a diffuse soft 
component one can determine a correction by mea- 
suring the energy density of the soft component in the 
event, multiplying it by the active jet area and then es- 
timating the corresponding shift in the value of the jet 
shape variable under study. The case of passive area pro- 
ceeds in a similar manner and is further discussed below 
in the context of jet mass area [12].

In the following, we first describe the general proce- 
dure, outline the expected corrections for mass, angular- 
ity and planar flow, and illustrate how the CDF data 
confirm our method predictions. We then comment on 
the relationship of our technique to the concept of jet 
mass area. 

II. THE GENERAL PRESCRIPTION

Consider a jet-shape variable X that characterizes 
the energy flow within ultra-massive highly boosted jets 
whose transverse momenta and invariant mass are in a 
given predefined range. Below we focus on the high jet 
mass region (> 70 GeV) since the QCD contribution is 
better controlled there, and since such massive jets are 
of special importance for various new physics searches. 

We evaluate the variation of X under the additional
incoherent component of radiation

$$\Delta X\big|_{p_J,m_J} \simeq \frac{\partial X}{\partial m_J}\bigg|_{p_J}\,\delta m_J + \sum_{i\in R}\frac{\partial X}{\partial E_i}\bigg|_{p_J,m_J}\,\delta E_i\,, \qquad (1)$$

where $p_J$ is the jet momentum (or transverse momentum for a
hadron collider) and the summation $\sum_{i\in R}$ corresponds
to the sum over the energies of calorimeter cells ($E_i$) inside
a jet with a size parameter $R$. The summation $\sum_{i\in R_{90^\circ}}$
corresponds to the sum of energy deposited in a cone of
area $a_0 = \pi R^2$ whose axis is rotated by 90° in the $\phi$ direction.
It is assumed here that X is measured in the leading jet
and that the incoherent energy deposition inside the leading
jet is equal, at least on average, to that observed in
the cone perpendicular in azimuth: $\sum_{i\in R} \to \sum_{i\in R_{90^\circ}}$. It
is worth mentioning here again that the method is independent
of the way the additional incoherent component
of energy is measured. This procedure will work for any
IRC-safe jet algorithm as long as $R^2 \ll 1$ (as we work to
leading order).

Generally, the correction to X ($\Delta X$) can be written as
a function of X itself for the variables we are interested
in, so that

$$\Delta X(p_J,m_J) = f(X,p_J,m_J)\,\delta m_J \oplus g(X,p_J,m_J)\,\delta E\,, \qquad (2)$$

where $f(X,p_J,m_J)$ and $g(X,p_J,m_J)$ are analytic functions
that are computed below for a few jet variables, and
the multiplicative coefficients for $\delta m_J$ and $\delta E$ can be
determined from the data.

The correction procedures for jet mass, angularity and
planar flow are derived below. The procedure gives rise
to concrete predictions for the form of the corrections
($\Delta X(X,p_J,m_J)$) as a function of the value of the jet
variable. Because the corrections can be determined directly
from the data, their uncertainties are relatively
small and can be controlled experimentally.
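
As a rough numerical illustration of the prescription above, the following sketch overlays cells taken from a cone perpendicular in azimuth onto a toy jet and measures the resulting shift of a jet variable (here the jet mass). The cell lists, the helper names and the treatment of cells as massless particles are assumptions made for illustration only; they are not taken from the CDF analysis. The shift is also compared with the leading-order estimate obtained from the product of the jet four-momentum with the summed four-momentum of the added cells.

```python
import math

def four_vector(energy, eta, phi):
    """Treat a calorimeter cell as a massless particle: (E, px, py, pz)."""
    pt = energy / math.cosh(eta)
    return (energy, pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))

def total(cells):
    """Sum the four-vectors of a list of (E, eta, phi) cells."""
    vectors = [four_vector(*cell) for cell in cells]
    return tuple(sum(component) for component in zip(*vectors))

def minkowski_dot(a, b):
    """Lorentz-invariant product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def mass(cells):
    """Invariant mass of a list of cells."""
    p = total(cells)
    return math.sqrt(max(minkowski_dot(p, p), 0.0))

def rotate_back(cells, dphi=math.pi / 2):
    """Map cells found in the cone rotated by 90 degrees in phi onto the jet cone."""
    return [(energy, eta, phi - dphi) for energy, eta, phi in cells]

# Hypothetical toy jet (axis at eta = 0, phi = 0) made of two hard cells.
jet_cells = [(250.0, 0.0, 0.25), (250.0, 0.0, -0.25)]
# Hypothetical soft cells found in the perpendicular cone (axis at phi = pi/2).
perp_cells = [(1.0, 0.3, math.pi / 2 + 0.4), (1.5, -0.2, math.pi / 2 - 0.5)]

overlay = rotate_back(perp_cells)
m_before = mass(jet_cells)
delta_m = mass(jet_cells + overlay) - m_before
# Leading-order estimate: delta_m ~ (P_J . sum of added momenta) / m_J.
lo_estimate = minkowski_dot(total(jet_cells), total(overlay)) / m_before
```

The same overlay can be reused for any other jet variable by replacing `mass` with the corresponding function of the cell list.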



III. SUBTRACTION METHOD FOR JET MASS 



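To estimate the right-hand side (RHS) of the jet mass correction,
note that the jet mass squared is given by
$m_J^2 = \left(\sum_{i\in R} p_i\right)^2 \equiv P_J^2$, so the correction to it is

$$\Delta m_J^2 \simeq 2\,P_J\cdot\sum_{i\in R_{90^\circ}}\delta p_i\,. \qquad (4)$$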



Since to leading order $\Delta m_J^2 \simeq 2 m_J\,\delta m_J$ we find that the
leading order correction to the jet mass is given by (for
a related discussion see |B]) 

$$\delta m_J \simeq \frac{1}{m_J}\,P_J\cdot\sum_{i\in R_{90^\circ}}\delta p_i\,. \qquad (5)$$

We thus find that for a fixed $p_T$ the correction to the
jet mass is proportional to the inverse of that mass, and
the coefficient can be fit from the data. This is in agreement
with the CDF results for the Midpoint, anti-$k_T$
[TB] or Midpoint/SC (Midpoint using search cones) jet 
algorithms. In this case, data were analyzed separately
for events with one primary vertex ($N_{vtx} = 1$)
and for events with multiple interactions ($N_{vtx} > 1$), i.e.
single- and multiple-interaction events. The $N_{vtx} > 1$
corrections behave as expected from the analysis above.
Furthermore, the $N_{vtx} = 1$ corrections show the average
effect of the underlying event in the hard scatter on
the jet mass, but may not accurately represent the true
behaviour of the soft component, given that our calculation
assumes it behaves incoherently. The difference
between the two corrections separates out the purely incoherent
component, and gives further confirmation that
the multiple interactions act purely incoherently, scaling
with the level of multiple interactions and having
the appropriate dependence on the jet radius.
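
To make the statement that "the coefficient can be fit from the data" concrete, here is a minimal sketch of a one-parameter least-squares fit of the form $\delta m_J = c/m_J$. The data points are made up purely to exercise the code, and the function name and the value of the coefficient are hypothetical; nothing here reproduces the CDF fit.

```python
def fit_inverse_mass(masses, shifts):
    """Least-squares estimate of c in the model shift = c / mass.

    Minimizing sum_i (shift_i - c / m_i)^2 with respect to c gives
    c = sum_i (shift_i / m_i) / sum_i (1 / m_i^2).
    """
    numerator = sum(shift / m for m, shift in zip(masses, shifts))
    denominator = sum(1.0 / (m * m) for m in masses)
    return numerator / denominator

# Hypothetical mass bins (GeV) and average mass shifts (GeV); toy values only.
masses = [70.0, 90.0, 110.0, 130.0, 150.0]
true_c = 400.0  # GeV^2, arbitrary number used to generate the toy points
shifts = [true_c / m for m in masses]

c_fit = fit_inverse_mass(masses, shifts)
```

With real data one would weight each point by its uncertainty; the unweighted form is kept here for brevity.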

This is shown in Fig. 1 from [7], which includes both
the PYTHIA 6.216 Monte Carlo (MC) prediction [14] (including
full detector simulation) and the fit to the functional
dependence given in Eq. (5). The vertical axis
corresponds to the average change in the jet mass upon
adding the contributions from the 90° cone as a function
of the measured jet mass (the horizontal axis). We do
not expect the MC to provide a precise determination
of the overall scale of the change but rather to give insight
into the shape of the correction, since statistics
are much less of an issue in this case. In particular, the
$N_{vtx} > 1$ contribution will be much smaller, given that this
MC calculation assumed only ~0.5 interactions in addition
to the hard scatter per event. The reader may note
that the plot also includes the low mass region, which is
beyond the focus of the present study.



This case is a simplification of the general case described
by Eq. (1), since X is one of the two variables
we normally control independently. Nevertheless, in order
to demonstrate the procedure we analyze it at some
length. The correction to the jet mass is:

$$\delta m_J \simeq \sum_{i\in R_{90^\circ}}\frac{\partial m_J}{\partial E_i}\,\delta E_i\,. \qquad (3)$$

IV. SUBTRACTION METHOD FOR ANGULARITY

The small angle expression for angularity is [111 US] 

$$\tau_a \simeq \frac{2^{a-1}}{m_J}\sum_{i\in R} E_i\,\theta_i^{2-a}\,, \qquad (6)$$

($\theta_i$ is the angle of cell $i$ relative to the jet axis)

FIG. 1: On the upper panel we show the CDF data and a fit
based on the relation derived in Eq. (5). The data collected
had on average ~3 multiple interactions per event (including
the hard interaction). On the lower panel we show the
corresponding MC predictions including full detector simulation [7].



where $a < 2$ is required for IRC safety. Recently, the
$a = -2$ distribution was measured by CDF for jets with
$p_T > 400$ GeV and mass in the window $90 < m_J < 120$ GeV,
hence we will focus on this specific value of angularity
(the procedure below should work, in principle,
for an arbitrary value of $a$; however, $a = 0$ is clearly special
since it is not independent of the jet mass variable).
To leading order, the correction from incoherent energy
deposition is given by



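$$\Delta\tau_a = \frac{\partial\tau_a}{\partial m_J}\,\delta m_J + \sum_{i\in R_{90^\circ}}\frac{\partial\tau_a}{\partial E_i}\,\delta E_i \simeq \frac{\delta m_J}{m_J}\left(\tau_a \oplus \frac{2^{a}\,m_J\sum_{i\in R_{90^\circ}}\delta E_i\,\theta_i^{2-a}}{p_J\sum_{i\in R_{90^\circ}}\delta E_i\,\theta_i^{2}}\right), \qquad (7)$$
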
where we use Eq. (4) to simplify the RHS. We note that
$\tau_a$ corresponds to the jet angularity before the correction.
We also note that the two types of contributions
should be added incoherently in quadrature, as indicated
by the $\oplus$ symbol. Eq. (7) implies that for a fixed jet mass
(as is often applied in new physics searches) the leading
order correction to the angularity consists of two terms:
a constant and a term proportional to the value of the
angularity itself.

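Let us denote by $R_{12}$ the ratio between the second and
the first terms in the parenthesis of the RHS of Eq. (7):

$$R_{12} = \frac{2^{a}\,m_J\sum_{i\in R_{90^\circ}}\delta E_i\,\theta_i^{2-a}}{\tau_a\,p_J\sum_{i\in R_{90^\circ}}\delta E_i\,\theta_i^{2}}\,. \qquad (8)$$
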
The above ratio can be estimated by taking the minimum
and maximum value for the angularity, $\tau_a^{\min,\max}$,
which may be obtained from the leading order perturbative
QCD result

$$\tau_a^{\min} \simeq \left(\frac{m_J}{2p_J}\right)^{1-a}, \qquad \tau_a^{\max} \simeq 2^{a-1}R^{-a}\,\frac{m_J}{p_J}\,. \qquad (9)$$

We therefore find that the ratio evaluated at the minimum
and maximum values of the angularity, $R_{12}(\tau_a^{\min,\max})$, is:

$$R_{12}\big|_{\tau_a^{\min}} \simeq 2\left(\frac{p_J R}{m_J}\right)^{-a}, \qquad R_{12}\big|_{\tau_a^{\max}} \simeq 2\,,$$

where on the RHS we have used the approximation $\theta_i \sim R$
for the most important contributions.
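
The limiting values of the angularity can be cross-checked numerically. The sketch below assumes the small-angle form $\tau_a \simeq (2^{a-1}/m_J)\sum_i E_i\theta_i^{2-a}$ and a jet made of exactly two massless partons; the function name, the limiting expressions coded as `tau_min` and `tau_max`, and the parameter values ($p_J = 400$ GeV, $m_J = 100$ GeV, $R = 0.7$, $a = -2$) are illustrative assumptions rather than part of the original analysis.

```python
def two_body_angularity(z, a, m_j, p_j):
    """Small-angle angularity of a two-parton jet.

    z is the energy fraction of the softer parton. The opening angle follows
    from m_J^2 = z (1 - z) p_J^2 theta12^2, and each parton sits at an angle
    from the jet axis inversely proportional to its energy fraction.
    """
    opening = m_j / (p_j * (z * (1.0 - z)) ** 0.5)
    theta_soft = (1.0 - z) * opening
    theta_hard = z * opening
    weighted = z * theta_soft ** (2.0 - a) + (1.0 - z) * theta_hard ** (2.0 - a)
    return 2.0 ** (a - 1.0) * (p_j / m_j) * weighted

a, m_j, p_j, R = -2.0, 100.0, 400.0, 0.7

# Assumed limiting forms: symmetric decay and soft parton at the cone edge.
tau_min = (m_j / (2.0 * p_j)) ** (1.0 - a)
tau_max = 2.0 ** (a - 1.0) * R ** (-a) * m_j / p_j

tau_sym = two_body_angularity(0.5, a, m_j, p_j)
# Energy fraction for which the softer parton sits exactly at the cone edge.
z_edge = 1.0 / (1.0 + (p_j * R / m_j) ** 2)
tau_edge = two_body_angularity(z_edge, a, m_j, p_j)
```

In this toy setup the symmetric configuration reproduces `tau_min`, while the edge configuration lies somewhat below `tau_max`, as expected since the latter neglects the recoil of the hard parton.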

The interesting angularity distributions, relevant to 
highly boosted massive jets, are those with negative 
a US] which emphasize the radiation towards the edge 
of the cone. Consequently, we find that over the interesting
range of parameters the constant correction term
dominates, with some subdominant linear contribution
towards $\tau_a^{\max}$. We also find that in general the relative
correction to angularity is small:

$$\frac{\Delta\tau_a}{\tau_a} \lesssim 2\,\frac{\delta m_J}{m_J} \ll 1. \qquad (10)$$


Analysis of the expected corrections at CDF shows that
for $p_T > 400$ GeV, $R = 0.7$ and $m_J \sim 100$ GeV,
$\Delta\tau_a/\tau_a < 2 \times 4\,\mathrm{GeV}/100\,\mathrm{GeV} = O(8\%)$, which is in good
agreement with the data [5]. The measured correction,
the PYTHIA 6.216 Monte Carlo (MC) prediction (including
full detector simulation) and the fit to the functional
dependence given by Eq. (7) are shown in Fig. 2
[7]. The vertical axis corresponds to the change in the
angularity upon adding the contributions from the 90°
cone as a function of the measured angularity (the horizontal
axis). The small number of events after having
imposed the high mass requirement does not allow us to
separate out contributions from single interaction events
and events with multiple interactions, as the data are dominated
by $N_{vtx} > 1$ events. The form of the distribution
is consistent with the prediction.

FIG. 2: On the upper panel we show the CDF data and a
fit based on the relation derived in Eq. (7). On the lower
panel we show the corresponding MC predictions including
full detector simulation [7].



V. SUBTRACTION METHOD FOR PLANAR 
FLOW 

To define the planar flow, Pf [H [H [H], we first 
construct, for a given jet, a $2\times 2$ matrix $I_E$

$$I_E^{kl} = \frac{1}{m_J}\sum_{i\in R} E_i\,\frac{p_{i,k}}{E_i}\,\frac{p_{i,l}}{E_i}\,, \qquad (11)$$



where $p_{i,k}$ is the $k$-th component of the $i$-th particle's
transverse momentum relative to the jet momentum axis.
We point out that at small angles $I_E$ corresponds to a
straightforward generalization of $\tau_0$, but promoted to a
two-dimensional tensor

$$\tau_0 \simeq \frac{1}{2m_J}\sum_{i\in R} E_i\,\theta_i^2 = \frac{1}{2}\,\mathrm{tr}(I_E)\,. \qquad (12)$$

We shall return to this point. Given $I_E$, we define $Pf$
for that jet as

$$Pf = 4\,\frac{\det(I_E)}{\mathrm{tr}(I_E)^2} = \frac{4\lambda_1\lambda_2}{(\lambda_1+\lambda_2)^2}\,, \qquad (13)$$

where $\lambda_{1,2}$ are the eigenvalues of $I_E$.



$I_E$ is a real symmetric matrix, so without loss of generality
it can be expanded as a sum of three basis matrices

$$I_E = p_0\,\sigma_0 + p_x\,\sigma_x + p_z\,\sigma_z\,, \qquad (14)$$

where $\sigma_0 = I_2/\sqrt{2}$ ($I_2$ is the unit matrix), $\sigma_{x,z}$ are the
corresponding Pauli matrices and we use the normalization
$\mathrm{tr}(\sigma_i\sigma_j) = \delta_{ij}$ such that the $\sigma_i$ form an orthonormal basis;
finally, the $p_i$ are real numbers, and the usefulness of
the analogy with a two-plus-one dimensional Lorentz group
becomes clear since $Pf$ is now given by

$$Pf = \frac{p_0^2 - p_1^2}{p_0^2} = 1 - \frac{p_1^2}{p_0^2}\,, \qquad (15)$$



with $p_1^2 = p_x^2 + p_z^2$. Let us first consider the contribution
to $Pf$ from a single calorimeter cell. It satisfies the "null
energy" condition of a massless particle, $(p_0^i)^2 - (p_1^i)^2 = 0$,
where this is independent of the chosen frame in which
$I_E$ is calculated. Note that this is the first point where
our result deviates from a generic trivial description of
symmetric real matrices. Thus $Pf$ actually corresponds
to one over the square of the boost factor for a system consisting of
a set of massless particles in three dimensions, or to the
ratio of the squared invariant mass of a set of "massless particles"
to the square of the sum of their energies.
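
The equivalence between the determinant/trace definition of planar flow and its form in terms of the expansion coefficients can be verified numerically. The following sketch builds the $2\times 2$ matrix from a list of toy cells and evaluates $Pf$ both ways; the cell values and function names are hypothetical, and the basis matrices are taken to be the unit and Pauli matrices divided by $\sqrt{2}$, as implied by the normalization above.

```python
import math

def planar_flow_matrix(cells, m_j):
    """Independent entries of I^{kl} = (1/m_J) sum_i E_i (p_ik/E_i)(p_il/E_i).

    Each cell is (E_i, p_i1, p_i2), with p_i1 and p_i2 the momentum components
    transverse to the jet axis.
    """
    i11 = i12 = i22 = 0.0
    for energy, p1, p2 in cells:
        i11 += p1 * p1 / energy
        i12 += p1 * p2 / energy
        i22 += p2 * p2 / energy
    return (i11 / m_j, i12 / m_j, i22 / m_j)

def planar_flow(i11, i12, i22):
    """Pf = 4 det(I) / tr(I)^2."""
    return 4.0 * (i11 * i22 - i12 * i12) / (i11 + i22) ** 2

def expansion_coefficients(i11, i12, i22):
    """Coefficients (p0, px, pz) of I on the orthonormal basis sigma/sqrt(2)."""
    root2 = math.sqrt(2.0)
    return ((i11 + i22) / root2, root2 * i12, (i11 - i22) / root2)

# Hypothetical cells: (energy, transverse component 1, transverse component 2).
toy_cells = [(50.0, 5.0, 1.0), (30.0, -2.0, 3.0), (20.0, 1.0, -4.0)]
matrix = planar_flow_matrix(toy_cells, m_j=100.0)

pf_direct = planar_flow(*matrix)
p0, px, pz = expansion_coefficients(*matrix)
pf_from_coefficients = 1.0 - (px * px + pz * pz) / (p0 * p0)

# A purely linear energy deposition has vanishing planar flow.
pf_linear = planar_flow(*planar_flow_matrix([(10.0, 1.0, 0.0), (20.0, -0.5, 0.0)], 100.0))
```

Note that $Pf$ does not depend on the overall normalization of the matrix, so the value of `m_j` drops out of the result.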

Let us find the leading order correction due to incoherent
energy depositions

$$\Delta Pf = \frac{\partial Pf}{\partial p_0}\,\delta p_0 + \frac{\partial Pf}{\partial p_1}\,\delta p_1 = \frac{2}{p_0^3}\left(p_1^2\,\delta p_0 - p_0\,p_1\,\delta p_1\right) = \frac{2}{p_0}\left[(1-Pf)\,\delta p_0 - \sqrt{1-Pf}\,\delta p_1\right]. \qquad (16)$$



In order to obtain the value of $p_0$ in terms of observables
we use Eq. (12):

$$p_0 = \frac{\mathrm{tr}(I_E)}{\sqrt{2}} = \sqrt{2}\,\tau_0\,. \qquad (17)$$



Here $\tau_0$ is a simple function of the jet mass and momentum,
as explicitly obtained when evaluating the jet
mass from its four-momentum (assuming $m_J \ll p_J$ and
$R \ll 1$):

$$\tau_0 \simeq \frac{m_J}{2p_J}\,. \qquad (18)$$

We thus obtain the final and simple result for the planar
flow correction:

$$\Delta Pf = \frac{\sqrt{2}\,p_J}{m_J}\left[(1-Pf)\,\delta p_0 \oplus \sqrt{1-Pf}\,\delta p_1\right]. \qquad (19)$$



Let us estimate the expected size of $\delta p_{0,1}$. Since
the correction from the incoherent radiation is random
we generally expect $\delta p_1 \sim \delta p_0$. Using Eqs. (17) and (18),
we find

$$\delta p_0 \simeq \frac{\delta m_J}{\sqrt{2}\,p_J}\,. \qquad (20)$$



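The largest correction is expected for $Pf \to 0$, which is
roughly given by

$$\Delta Pf \simeq \frac{\sqrt{2}\,p_J}{m_J}\sqrt{\delta p_0^2 + \delta p_1^2} \simeq \sqrt{2}\,\frac{\delta m_J}{m_J}\,. \qquad (21)$$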



For the CDF data we find $\Delta Pf < 7\%$ for $m_J \sim 100$ GeV.
The measured correction, the MC prediction (including
full detector simulation) and the fit to the functional dependence
given in Eq. (19) are shown in Fig. 3, taken from
the CDF data [7]. The vertical axis corresponds to the
change in the observed planar flow as a function of the
planar flow. As in the angularity case, contributions from
single-vertex events are not separated given their small
number. The shape and normalization of the distribution
are consistent with the prediction.

FIG. 3: On the upper panel we show the CDF data and a
fit based on the relation derived in Eq. (19). On the lower
panel we show the corresponding MC predictions including
full detector simulation [7].



VI. RELATION WITH JET AREAS 

Recently, the concept of "jet area" was introduced [4]
as a way of understanding the behaviour of jet observables
in high instantaneous luminosity environments. It
was shown that once the jet's size becomes dynamical, as
with modern IRC-safe jet algorithms, this concept turns
out to be useful when assessing the susceptibility to incoherent
energy contributions of various jet-variable measurements.
Our emphasis here is slightly different: as
in [4], we focus on applying data-driven corrections to
jet-variable distributions over a large range of instantaneous
luminosities, but we are also interested in obtaining a
semi-analytical understanding of the possible shape and size
of the correction for each of the substructure variables.
It is interesting to briefly mention the correspondence
with the jet area concept, in particular in the context
of the recent study of the "mass area" [12]. Our aim is
two-fold. First, we show that knowing the jet mass and
other shape variables such as angularity allows one to
determine the jet mass area (and possibly
other jet shape areas) more precisely. Second, we argue that in the
region of interest, the difference between jet mass area and
jet area (of massless QCD events) is small, which implies
that our method can be easily adapted by using a global
extraction of the median energy density from data.

We demonstrate our points explicitly using studies of
the Midpoint and anti-$k_T$ jet algorithms performed by
the CDF collaboration [7] (the Midpoint results are essentially
identical to those obtained with the SISCone
algorithm [20], as expected since the two algorithms use
a similar split and merge procedure). However, it is trivial
to see that the same conclusions also apply to other jet
algorithms. In the following we consider the "passive jet
area" concept, where analytic results can be obtained.
We focus on the region with a high degree of collimation
for ultra-massive jets, defined as $\epsilon \ll 1$ where
$\epsilon = m/(p_T R)$. It is assumed that the boosted jet consists
of two partonic decay products of a heavy particle
of mass $m$, which are well contained inside the jet (as is
now qualitatively established by the CDF study of angularity [7],
this assumption holds also for QCD massive
jets, basically doing the measurement for a fixed $\epsilon$). It
is useful to define $a_0 = \pi R^2$ as the naive jet area of radius
$R$, and following the definitions of [12] we define $\Delta_{12}$
as the rapidity-azimuth difference between the two daughter
particles, $x = \Delta_{12}/R$ and $z = \min(p_{T,1}, p_{T,2})/p_T$, in order
to characterize the primary daughter particles.

We find the following relation between $x$ and $z$ (assuming
$\epsilon \ll 1$)

$$x^2 = \frac{\epsilon^2}{z(1-z)}\,, \qquad (22)$$

and for later usage denote $z_1(x = 1) = \epsilon^2(1+\epsilon^2) + O(\epsilon^6)$.
Let us begin by discussing the SISCone jet finder.
In this case one can minimize the area of the boosted
jet by requiring the two daughter partons to be contained
in a single jet. This is satisfied provided that
$1 < x < x_c = 1/(1-z)$ [12]. The left inequality implies
that $0 < z < z_1$. On the other hand, maximizing
the boosted jet area is achieved when the jet is the
union of the three cones (around the mother and two
daughter particles). This implies that $x_c > x$, namely,
$z > \epsilon^2(1-\epsilon^2) + O(\epsilon^6)$. Since $\epsilon$ is by construction very
small, $z$ cannot change significantly, which leads, as
anticipated, to small differences between the jet mass
area and that of low mass jets analyzed in [2]. For
the anti-$k_T$ algorithm, two interesting cases are found:
(1) for $1/(1+z) < x < 1$ the jet area is bigger than
$a_0$, and solutions are found for $8\epsilon^2 < 1$, implying that
$z_1 < z < \epsilon^2(1+3\epsilon^2) + O(\epsilon^6)$; (2) for $1 < x < x_c$ the jet
area is smaller than $a_0$, and this coincides with the case
analyzed above for SISCone, again implying that for massive
boosted collimated jets the expected deviation from
the low mass jet area is small. Thus, when studying high
mass collimated jets one can approximately use either
the 90° cone method or derive the correction from the
product of the jet area and the energy density following
the prescription of [4].
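
The small-$\epsilon$ expansions quoted for the boundaries in $z$ can be checked against the exact two-body relation. The sketch below assumes the relation $x^2 = \epsilon^2/(z(1-z))$ between $x$ and $z$ and solves it for the smaller root; the function name and the chosen value $\epsilon = 0.1$ are illustrative assumptions.

```python
def z_of_x(x, eps):
    """Smaller root z of x^2 = eps^2 / (z (1 - z)), valid for x >= 2 eps."""
    discriminant = 1.0 - 4.0 * eps * eps / (x * x)
    return 0.5 * (1.0 - discriminant ** 0.5)

eps = 0.1  # hypothetical value of m / (p_T R)

# Boundary x = 1: exact solution versus eps^2 (1 + eps^2).
z1_exact = z_of_x(1.0, eps)
z1_series = eps ** 2 * (1.0 + eps ** 2)

# Boundary x = x_c = 1 / (1 - z): substituting into the relation gives
# eps^2 (1 - z) = z, i.e. z = eps^2 / (1 + eps^2), versus eps^2 (1 - eps^2).
zc_exact = eps ** 2 / (1.0 + eps ** 2)
zc_series = eps ** 2 * (1.0 - eps ** 2)
```

For $\epsilon = 0.1$ both boundaries sit within about one percent of $z = \epsilon^2$, which illustrates how narrow the allowed range of $z$ is in the collimated regime.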

Finally, we would like to mention that under the two-body
approximation [16], for a fixed momentum and mass
the two-body jet's energy flow is fully characterized by a
single continuous parameter. The asymmetry parameter
$z$ is a simple function of the angularity or the soft particle
distance from the jet axis [17]. It implies that given a jet
mass and angularity one can extract the parameter $x$ in
Eq. (22), fully determining the jet mass area as defined
in [12].

For example, expanding the angularity variable away
from the symmetric configuration ($z \ll 1/2$) one finds

--[-y) M ' (23) 

which, as expected, shows that as the angularity (which is
supported by radiation towards the cone edge) increases
so does the asymmetry ($z \to 0$). The distribution of
boosted massive jets originated both from QCD and mas- 
sive particles (with 2-body decay) peak around the sym- 
metric configuration [TB] (around z = 1/2, unlike what is 
sometimes mentioned in the literature). Therefore, it is 
useful to show the relation between z and angularity in 
this region as well 



where, as expected, we see that as the angularity departs
from its minimal value the asymmetry parameter
decreases from its maximum value accordingly. We emphasize
again that the recent CDF study indeed qualitatively
confirms the two-body description of massive jets:
the peak around $\tau_a^{\min}$ and a drop for larger values
are clearly observed [7]. It implies that one can further
sharpen the extraction of the jet area via a measurement
of its angularity.

VII. CONCLUSIONS 



To conclude, we have provided a formalism to determine
and take into account the effects of incoherent
(and approximately incoherent) radiation on various
jet-variable distributions. We showed that the incoherent
radiation induces jet-shape-dependent corrections, and
showed how they can be calculated from collision data
for jet mass, angularity and planar flow. We provided
an analytic form for the corrections to these variables.
These predictions have been supported by MC studies
and have been verified by results from the CDF collaboration [7].
Finally, we also commented on the relation of
our method to the concept of jet area.